Library summary stats

No. of fragments per sample?

##        fragments.per.sample
## mean             44,955,206
## sd                6,398,756
## se                  748,918
## median           43,343,558
## min              35,623,905
## max              64,917,188

Did the no. of fragments per treatment differ?

Boxplots of the # of fragments in each sample per treatment
Statistically test whether no. of fragments / treatment differs

## 
##  Shapiro-Wilk normality test
## 
## data:  log(fragments$read.total)
## W = 0.94823, p-value = 0.00465
## Analysis of Variance Table
## 
## Response: log(read.total)
##           Df  Sum Sq  Mean Sq F value Pr(>F)
## treatment  5 0.12749 0.025497  1.4563 0.2159
## Residuals 67 1.17306 0.017508

How many samples per treatment, tank, etc?

## # A tibble: 6 × 2
##   treatment     n
##   <ord>     <int>
## 1 3_amb        12
## 2 3_low        11
## 3 6_amb        14
## 4 6_low        13
## 5 10_amb       11
## 6 10_low       12
## # A tibble: 3 × 2
##   temperature     n
##   <fct>       <int>
## 1 3              23
## 2 6              27
## 3 10             23
## # A tibble: 2 × 2
##   ph        n
##   <fct> <int>
## 1 amb      37
## 2 low      36
## # A tibble: 21 × 3
## # Groups:   treatment [6]
##    treatment tank      n
##    <ord>     <fct> <int>
##  1 3_amb     19       12
##  2 3_low     13        3
##  3 3_low     14        3
##  4 3_low     21        2
##  5 3_low     22        3
##  6 6_amb     3         4
##  7 6_amb     4         4
##  8 6_amb     17        3
##  9 6_amb     18        3
## 10 6_low     7         3
## # … with 11 more rows

How many genes total in the filtered count matrix?

## [1] "20,975"

What’s the average no. of genes per sample, etc.?

##        genes.per.sample
## mean             20,934
## sd                   18
## se                    2
## median           20,937
## min              20,852
## max              20,955

Total no. of genes identified in each sample

##        temperature  ph treatment tank treatment_tank LOC115539476 LOC115539709
## PCG001          10 amb    10_amb    1       10_amb_1          257          758
## PCG004          10 amb    10_amb    1       10_amb_1          228          741
## PCG010          10 amb    10_amb    1       10_amb_1          317          814
## PCG011          10 amb    10_amb    1       10_amb_1          333          837
## PCG015          10 amb    10_amb    1       10_amb_1          219          677
## PCG017          10 amb    10_amb    2       10_amb_2          270          482
##        LOC115538781 abhd14a acy1 LOC115537228 LOC115537019 LOC115538651
## PCG001          586    1197 1626         2106          659          659
## PCG004          596    1381 1244         2402          544          625
## PCG010          664    1313 1691         2367          552          616
## PCG011          576    1629 1670         2555          911          571
## PCG015          630    1031 1245         2008          564          504
## PCG017          417     774 1224         1845          505          433
##        LOC115538267 kbtbd12 ccdc36 c1h1orf159 LOC115538500 LOC115539723
## PCG001           54    1003     42         37          731         2841
## PCG004          177     743     42         53          948         3579
## PCG010          126     722     32         57          837         3088
## PCG011           74     646     44         71          688         2732
## PCG015          143     758     53         73          623         3212
## PCG017           97     714     20         75          675         4046
##        LOC115537157 usp48 LOC115537633 LOC115539576 atxn7l2
## PCG001            1  1843          254          274     277
## PCG004            7  2320          258          288     324
## PCG010           12  2328          324          281     339
## PCG011           12  1881          259          245     253
## PCG015           10  1872          281          254     277
## PCG017            8  2260          400          284     292
## [1] TRUE

Unsupervised analyses

Heat maps

Heatmap from all gene counts, variance-stabilization transformed

Heat map with top 10% most variable genes, VSD-transformed counts

Hierarchical clustering

NOTE: this was generated by the pheatmap function in the previous code chunk

## [1] TRUE

NOTE: Samples 194 & 137 look like outliers!

PCA’s

PCAs using DESeq2

NOTE: Hover over points to see the sample numbers

This PCA also suggests that two samples - 137 and 194 - are outliers. Consider removing.

PCAs using prcomp

This enables screeplot to ID significant PCs, calculates variance, etc.

## Importance of components:
##                               PC1         PC2         PC3        PC4        PC5
## Variance(eigenvalue)   814.575515 170.8781408 132.5304807 96.9091562 72.7767275
## Proportion of Variance   0.383931   0.0805394   0.0624651  0.0456759  0.0343016
## Cumulative Proportion    0.383931   0.4644704   0.5269356  0.5726114  0.6069130
## Broken-stick value       4.874509   3.8745088   3.3745088  3.0411755  2.7911755
##                               PC6        PC7        PC8        PC9       PC10
## Variance(eigenvalue)   63.3710808 53.0599813 44.2083670 33.2200195 27.5233513
## Proportion of Variance  0.0298685  0.0250086  0.0208366  0.0156575  0.0129725
## Cumulative Proportion   0.6367815  0.6617901  0.6826266  0.6982841  0.7112566
## Broken-stick value      2.5911755  2.4245088  2.2816516  2.1566516  2.0455405
##                              PC11       PC12       PC13       PC14      PC15
## Variance(eigenvalue)   26.3006437 25.1253462 22.3765405 20.8120895 20.270432
## Proportion of Variance  0.0123962  0.0118422  0.0105467  0.0098093  0.009554
## Cumulative Proportion   0.7236528  0.7354950  0.7460417  0.7558510  0.765405
## Broken-stick value      1.9455405  1.8546314  1.7712981  1.6943750  1.622947
##                              PC16       PC17       PC18       PC19       PC20
## Variance(eigenvalue)   18.3470894 17.8624428 17.1131327 16.3363877 15.8122209
## Proportion of Variance  0.0086475  0.0084190  0.0080659  0.0076998  0.0074527
## Cumulative Proportion   0.7740525  0.7824715  0.7905374  0.7982371  0.8056899
## Broken-stick value      1.5562798  1.4937798  1.4349563  1.3794007  1.3267691
##                              PC21       PC22       PC23       PC24       PC25
## Variance(eigenvalue)   15.0447656 13.3316321 13.1757606 12.7745293 12.0961356
## Proportion of Variance  0.0070910  0.0062836  0.0062101  0.0060210  0.0057012
## Cumulative Proportion   0.8127809  0.8190644  0.8252745  0.8312955  0.8369967
## Broken-stick value      1.2767691  1.2291501  1.1836955  1.1402173  1.0985506
##                              PC26       PC27       PC28       PC29       PC30
## Variance(eigenvalue)   11.9222014 11.6971429 11.1118224 10.7528323 10.3148264
## Proportion of Variance  0.0056192  0.0055132  0.0052373  0.0050681  0.0048617
## Cumulative Proportion   0.8426159  0.8481291  0.8533664  0.8584345  0.8632962
## Broken-stick value      1.0585506  1.0200891  0.9830520  0.9473377  0.9128550
##                             PC31      PC32      PC33      PC34      PC35
## Variance(eigenvalue)   9.8561080 9.8032776 9.6599644 9.4759712 9.1288228
## Proportion of Variance 0.0046454 0.0046205 0.0045530 0.0044663 0.0043027
## Cumulative Proportion  0.8679416 0.8725622 0.8771152 0.8815814 0.8858841
## Broken-stick value     0.8795217 0.8472636 0.8160136 0.7857106 0.7562988
##                             PC36      PC37      PC38      PC39      PC40
## Variance(eigenvalue)   8.8696675 8.7441125 8.6088049 8.3940185 8.2364842
## Proportion of Variance 0.0041805 0.0041213 0.0040576 0.0039563 0.0038821
## Cumulative Proportion  0.8900646 0.8941859 0.8982435 0.9021998 0.9060819
## Broken-stick value     0.7277274 0.6999496 0.6729226 0.6466068 0.6209657
##                             PC41      PC42      PC43      PC44      PC45
## Variance(eigenvalue)   8.0999799 7.9089651 7.8861890 7.7561356 7.5987719
## Proportion of Variance 0.0038177 0.0037277 0.0037170 0.0036557 0.0035815
## Cumulative Proportion  0.9098996 0.9136273 0.9173443 0.9210000 0.9245815
## Broken-stick value     0.5959657 0.5715755 0.5477660 0.5245102 0.5017829
##                             PC46      PC47      PC48      PC49      PC50
## Variance(eigenvalue)   7.4077481 7.1665983 7.1330399 6.9318846 6.8485534
## Proportion of Variance 0.0034915 0.0033778 0.0033620 0.0032672 0.0032279
## Cumulative Proportion  0.9280729 0.9314507 0.9348127 0.9380799 0.9413078
## Broken-stick value     0.4795607 0.4578215 0.4365449 0.4157116 0.3953034
##                             PC51      PC52      PC53      PC54      PC55
## Variance(eigenvalue)   6.7220797 6.6352222 6.4961766 6.4286927 6.3415097
## Proportion of Variance 0.0031683 0.0031274 0.0030618 0.0030300 0.0029889
## Cumulative Proportion  0.9444761 0.9476035 0.9506653 0.9536953 0.9566842
## Broken-stick value     0.3753034 0.3556956 0.3364648 0.3175969 0.2990784
##                             PC56      PC57      PC58      PC59      PC60
## Variance(eigenvalue)   6.1071900 6.0698927 6.0035330 5.8990444 5.8974722
## Proportion of Variance 0.0028785 0.0028609 0.0028296 0.0027804 0.0027796
## Cumulative Proportion  0.9595627 0.9624236 0.9652532 0.9680336 0.9708132
## Broken-stick value     0.2808966 0.2630394 0.2454956 0.2282542 0.2113050
##                             PC61      PC62      PC63      PC64      PC65
## Variance(eigenvalue)   5.7345874 5.5844864 5.5057888 5.4439828 5.3123272
## Proportion of Variance 0.0027029 0.0026321 0.0025950 0.0025659 0.0025038
## Cumulative Proportion  0.9735161 0.9761482 0.9787432 0.9813091 0.9838130
## Broken-stick value     0.1946384 0.1782449 0.1621159 0.1462429 0.1306179
##                             PC66      PC67      PC68      PC69      PC70
## Variance(eigenvalue)   5.2219647 5.0468673 5.0410113 4.8607550 4.7943737
## Proportion of Variance 0.0024613 0.0023787 0.0023760 0.0022910 0.0022597
## Cumulative Proportion  0.9862742 0.9886530 0.9910289 0.9933199 0.9955796
## Broken-stick value     0.1152333 0.1000817 0.0851564 0.0704505 0.0559577
##                             PC71      PC72      PC73
## Variance(eigenvalue)   4.7460929 4.6324740 0.0000000
## Proportion of Variance 0.0022370 0.0021834 0.0000000
## Cumulative Proportion  0.9978166 1.0000000 1.0000000
## Broken-stick value     0.0416720 0.0275875 0.0136986
## Importance of components:
## [1] 46.44704

Heatmap of the sample-to-sample distances

Another use of the transformed data is sample clustering. Here, we apply the dist function to the transpose of the transformed count matrix to get sample-to-sample distances.

A heatmap of this distance matrix gives us an overview over similarities and dissimilarities between samples. We have to provide a hierarchical clustering hc to the heatmap function based on the sample distances, or else the heatmap function would calculate a clustering based on the distances between the rows/columns of the distance matrix.

## [1] TRUE

Differential Expression Analysis

This is the core DESeq2 analysis!

Identify differentially expressed genes among contrasts

pH contrasts

## [1] "Comparison: Ambient pH vs. Low pH, 3C"
## 
## out of 20975 with nonzero total read count
## adjusted p-value < 0.05
## LFC > 0 (up)       : 0, 0%
## LFC < 0 (down)     : 2, 0.0095%
## outliers [1]       : 0, 0%
## low counts [2]     : 0, 0%
## (mean count < 4)
## [1] see 'cooksCutoff' argument of ?results
## [2] see 'independentFiltering' argument of ?results
## [1] "No. of genes differentially expressed (padj<0.05) by pH, 3C: 2"
## [1] "Comparison: Ambient pH vs. Low pH, 6C"
## 
## out of 20975 with nonzero total read count
## adjusted p-value < 0.05
## LFC > 0 (up)       : 2709, 13%
## LFC < 0 (down)     : 1941, 9.3%
## outliers [1]       : 0, 0%
## low counts [2]     : 1220, 5.8%
## (mean count < 24)
## [1] see 'cooksCutoff' argument of ?results
## [2] see 'independentFiltering' argument of ?results
## [1] "No. of genes differentially expressed (padj<0.05) by pH, 6C: 4650"
## [1] "Comparison: Ambient pH vs. Low pH, 10C"
## 
## out of 20975 with nonzero total read count
## adjusted p-value < 0.05
## LFC > 0 (up)       : 54, 0.26%
## LFC < 0 (down)     : 59, 0.28%
## outliers [1]       : 0, 0%
## low counts [2]     : 4067, 19%
## (mean count < 117)
## [1] see 'cooksCutoff' argument of ?results
## [2] see 'independentFiltering' argument of ?results
## [1] "No. of genes differentially expressed (padj<0.05) by pH, 10C: 113"

Temperature contrasts

Comparing temperatures within ambient pH
## [1] "Comparison: 3C vs. 6C, ambient pH"
## 
## out of 20975 with nonzero total read count
## adjusted p-value < 0.05
## LFC > 0 (up)       : 1241, 5.9%
## LFC < 0 (down)     : 1404, 6.7%
## outliers [1]       : 0, 0%
## low counts [2]     : 0, 0%
## (mean count < 4)
## [1] see 'cooksCutoff' argument of ?results
## [2] see 'independentFiltering' argument of ?results
## [1] "No. of genes differentially expressed (padj<0.05) between 6C & 10C, ambient pH: 2645"
## [1] "Comparison: 3C vs. 10C, ambient pH"
## 
## out of 20975 with nonzero total read count
## adjusted p-value < 0.05
## LFC > 0 (up)       : 5518, 26%
## LFC < 0 (down)     : 5941, 28%
## outliers [1]       : 0, 0%
## low counts [2]     : 0, 0%
## (mean count < 4)
## [1] see 'cooksCutoff' argument of ?results
## [2] see 'independentFiltering' argument of ?results
## [1] "No. of genes differentially expressed (padj<0.05) between 3C & 10C, ambient pH: 11459"
## [1] "Comparison: 6C vs. 10C, ambient pH"
## 
## out of 20975 with nonzero total read count
## adjusted p-value < 0.05
## LFC > 0 (up)       : 4720, 23%
## LFC < 0 (down)     : 4328, 21%
## outliers [1]       : 0, 0%
## low counts [2]     : 0, 0%
## (mean count < 4)
## [1] see 'cooksCutoff' argument of ?results
## [2] see 'independentFiltering' argument of ?results
## [1] "No. of genes differentially expressed (padj<0.05) between 6C & 10C, ambient pH: 9048"
Comparing temperatures within low pH
## [1] "Comparison: 3C vs. 6C, low pH"
## 
## out of 20975 with nonzero total read count
## adjusted p-value < 0.05
## LFC > 0 (up)       : 4281, 20%
## LFC < 0 (down)     : 4381, 21%
## outliers [1]       : 0, 0%
## low counts [2]     : 0, 0%
## (mean count < 4)
## [1] see 'cooksCutoff' argument of ?results
## [2] see 'independentFiltering' argument of ?results
## [1] "No. of genes differentially expressed (padj<0.05) between 6C & 10C, low pH: 8662"
## [1] "Comparison: 3C vs. 10C, low pH"
## 
## out of 20975 with nonzero total read count
## adjusted p-value < 0.05
## LFC > 0 (up)       : 4943, 24%
## LFC < 0 (down)     : 4938, 24%
## outliers [1]       : 0, 0%
## low counts [2]     : 0, 0%
## (mean count < 4)
## [1] see 'cooksCutoff' argument of ?results
## [2] see 'independentFiltering' argument of ?results
## [1] "No. of genes differentially expressed (padj<0.05) between 3C & 10C, low pH: 9881"
## [1] "Comparison: 6C vs. 10C, low pH"
## 
## out of 20975 with nonzero total read count
## adjusted p-value < 0.05
## LFC > 0 (up)       : 1277, 6.1%
## LFC < 0 (down)     : 1067, 5.1%
## outliers [1]       : 0, 0%
## low counts [2]     : 0, 0%
## (mean count < 4)
## [1] see 'cooksCutoff' argument of ?results
## [2] see 'independentFiltering' argument of ?results
## [1] "No. of genes differentially expressed (padj<0.05) between 6C & 10C, low pH: 2344"

Number of genes analyzed

## [1] 20975

Summary stats for the number of Differentially Expressed Genes (DEGs)

Bar plot of the number of DEGs identified in each contrast

Number of unique DEGs

Among pH, for each temperature?
## [1] 4741
Among temperatures, ambient pH?
## [1] 13052
Among temperatures, low pH?
## [1] 12480
What % of all examined genes are DEGs among pH treatments?
## [1] 22.6
What % of all examined genes are DEGs among temperature treatments (ambient pH only)?
## [1] 62.2
What % of all examined genes are DEGs among temperature treatments (low pH only)?
## [1] 59.5

Heat maps of DEGs

Single heatmap with all DEGs among any pH treatment

NOTE: There are 4,525 DEGs among pH treatment in the 6C group, but only 2 and 58 among the 3C and 10C groups, respectively. This is visible in the heat map- there are clear differences in expression in the 6_amb (green) and 6_low (yellow) treatments.

## [1] TRUE

Single heatmap with all DEGs among any temperature treatment, ambient pH only

## [1] TRUE

Single heatmap with all DEGs among any temperature treatment, low pH only

## [1] TRUE

Venn Diagrams of DEGs by pairwise contrast

## [1] "Amb vs. Low pH @3C"   "Amb vs. Low pH @6C"   "Amb vs. Low pH @ 10C"
## [4] "3 vs. 6C @Amb pH"     "3 vs. 10C @Amb pH"    "6 vs. 10C @Amb pH"   
## [7] "3 vs. 6C @Low pH"     "3 vs. 10C @Low pH"    "6 vs. 10C @Low pH"

pH contrasts

Same temperature contrasts, compare low vs. ambient pH conditions

DAVID enrichment analysis

It is a little clunky, but the DAVID functional analysis tool is one of the best ways to perform enrichment analysis (that I have found to date). To use, you copy a list of Uniprot IDs (or other gene identifier) to clipboard, then paste it into their tool. You then do the same for all genes in the analysis, providing an accurate background list of genes. Here is code to copy all genes, then upregulated and downregulated DEGs identified by each contrast.

Bubble plot of enriched Biological Processes

pH contrasts for each temperature

Temperature contrasts for each pH